data term
Symmetry-Breaking Descent for Invariant Cost Functionals
We study the problem of reducing a task cost functional $W : H^s(M) \to \mathbb{R}$, not assumed continuous or differentiable, defined over Sobolev-class signals $S \in H^s(M) $, in the presence of a global symmetry group $G \subset \mathrm{Diff}(M)$. The group acts on signals by pullback, and the cost $W$ is invariant under this action. Such scenarios arise in machine learning and related optimization tasks, where performance metrics may be discontinuous or model-internal. We propose a variational method that exploits the symmetry structure to construct explicit deformations of the input signal. A deformation control field $ ϕ: M \to \mathbb R^d$, obtained by minimizing an auxiliary energy functional, induces a flow that generically lies in the normal space (with respect to the $L^2$ inner product) to the $G$-orbit of $S$, and hence is a natural candidate to cross the decision boundary of the $G $-invariant cost. We analyze two variants of the coupling term: (1) purely geometric, independent of $W$, and (2) weakly coupled to $W$. Under mild conditions, we show that symmetry-breaking deformations of the signal can reduce the cost. Our approach requires no gradient backpropagation or training labels and operates entirely at test time. It provides a principled tool for optimizing discontinuous invariant cost functionals via Lie-algebraic variational flows.
Using football to explain data terms?
I've recently had a number of people ask me about the huge range of data related terms that keep cropping up: Data, Big data, Data Analytics, Algorithms, Machine learning to name a few.... many definitions are available, but these are not always easy to put together. Below is my view of how these fit together using football! Data is a collection of stuff (excuse the technical jargon) that is thought to mean something. It could be a number, a word, a picture, anything. Data alone is not worth much as it is just a collection of numbers or facts, but with some interpretation it can tell you a lot and maybe help tell you something about what might happen in the future. Let's look at football as an example: pick up the newspaper and turn to the football section.
Combining nonparametric spatial context priors with nonparametric shape priors for dendritic spine segmentation in 2-photon microscopy images
Erdil, Ertunc, Argunsah, Ali Ozgur, Tasdizen, Tolga, Unay, Devrim, Cetin, Mujdat
Data driven segmentation is an important initial step of shape prior-based segmentation methods since it is assumed that the data term brings a curve to a plausible level so that shape and data terms can then work together to produce better segmentations. When purely data driven segmentation produces poor results, the final segmentation is generally affected adversely. One challenge faced by many existing data terms is due to the fact that they consider only pixel intensities to decide whether to assign a pixel to the foreground or to the background region. When the distributions of the foreground and background pixel intensities have significant overlap, such data terms become ineffective, as they produce uncertain results for many pixels in a test image. In such cases, using prior information about the spatial context of the object to be segmented together with the data term can bring a curve to a plausible stage, which would then serve as a good initial point to launch shape-based segmentation. In this paper, we propose a new segmentation approach that combines nonparametric context priors with a learned-intensity-based data term and nonparametric shape priors. We perform experiments for dendritic spine segmentation in both 2D and 3D 2-photon microscopy images. The experimental results demonstrate that using spatial context priors leads to significant improvements.
Using Large Ensembles of Control Variates for Variational Inference
Variational inference is increasingly being addressed with stochastic optimization. In this setting, the gradient's variance plays a crucial role in the optimization procedure, since high variance gradients lead to poor convergence. A popular approach used to reduce gradient's variance involves the use of control variates. Despite the good results obtained, control variates developed for variational inference are typically looked at in isolation. In this paper we clarify the large number of control variates that are available by giving a systematic view of how they are derived. We also present a Bayesian risk minimization framework in which the quality of a procedure for combining control variates is quantified by its effect on optimization convergence rates, which leads to a very simple combination rule. Results show that combining a large number of control variates this way significantly improves the convergence of inference over using the typical gradient estimators or a reduced number of control variates.
Using Large Ensembles of Control Variates for Variational Inference
Variational inference is increasingly being addressed with stochastic optimization. In this setting, the gradient's variance plays a crucial role in the optimization procedure, since high variance gradients lead to poor convergence. A popular approach used to reduce gradient's variance involves the use of control variates. Despite the good results obtained, control variates developed for variational inference are typically looked at in isolation. In this paper we clarify the large number of control variates that are available by giving a systematic view of how they are derived. We also present a Bayesian risk minimization framework in which the quality of a procedure for combining control variates is quantified by its effect on optimization convergence rates, which leads to a very simple combination rule. Results show that combining a large number of control variates this way significantly improves the convergence of inference over using the typical gradient estimators or a reduced number of control variates.
Using Large Ensembles of Control Variates for Variational Inference
Variational inference is increasingly being addressed with stochastic optimization. In this setting, the gradient's variance plays a crucial role in the optimization procedure, since high variance gradients lead to poor convergence. A popular approach used to reduce gradient's variance involves the use of control variates. Despite the good results obtained, control variates developed for variational inference are typically looked at in isolation. In this paper we clarify the large number of control variates that are available by giving a systematic view of how they are derived. We also present a Bayesian risk minimization framework in which the quality of a procedure for combining control variates is quantified by its effect on optimization convergence rates, which leads to a very simple combination rule. Results show that combining a large number of control variates this way significantly improves the convergence of inference over using the typical gradient estimators or a reduced number of control variates.
TRex: A Tomography Reconstruction Proximal Framework for Robust Sparse View X-Ray Applications
Aly, Mohamed, Zang, Guangming, Heidrich, Wolfgang, Wonka, Peter
We provide an overview and perform an experimental comparison between the famous iterative reconstruction methods in terms of reconstruction quality in sparse view situations. We then derive the proximal operators for the four best methods. We show the flexibility of our framework by deriving solvers for two noise models: Gaussian and Poisson; and by plugging in three powerful regularizers. We compare our framework to state of the art methods, and show superior quality on both synthetic and real datasets.
Data Compression for Learning MRF Parameters
Refaat, Khaled S. (University of California, Los Angeles) | Darwiche, Adnan (University of California, Los Angeles)
We propose a technique for decomposing and compressing the dataset in the parameter learning problem in Markov random fields. Our technique applies to incomplete datasets and exploits variables that are always observed in the given dataset. We show that our technique allows exact computation of the gradient and the likelihood, and can lead to orders-of-magnitude savings in learning time.